1.
Conference Proceedings - IEEE SOUTHEASTCON ; 2023-April:610-617, 2023.
Article in English | Scopus | ID: covidwho-20242090

ABSTRACT

We demonstrate the feasibility of a generalized technique for semantic deduplication in temporal data domains using graph-based representations of data records. Structured data records with multiple timestamp attributes per record may be represented as a directed graph where the nodes represent the events and the edges represent event sequences. Edge weights are based on elapsed time between connecting nodes. In comparing two records, we may merge these directed graphs and determine a representative directed acyclic graph (DAG) inclusive of a subset of nodes and edges that maintain the transitive weights of the original graphs. This DAG may then be evaluated by weighting elapsed time equivalences between records at each node and measuring the fraction of nodes represented in the DAG versus the union of nodes between the records being compared. With this information, we establish a duplication score and use a specified threshold requirement to assert duplication. This method is referred to as Temporal Deduplication using Directed Acyclic Graphs (TD:DAG). TD:DAG significantly outperformed established ASNM and ASNM+LCS methods for datasets representing two disparate domains, COVID-19 government policy data and PlayStation Network (PSN) trophy data. TD:DAG produced highly effective and comparable F1 scores of 0.960 and 0.972 for the two datasets, respectively, versus 0.864/0.938 for ASNM+LCS and 0.817/0.708 for ASNM. © 2023 IEEE.
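
The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' TD:DAG implementation: the record layout (event name mapped to a numeric timestamp), the function names `to_edges` and `duplication_score`, the tolerance parameter `tol`, and the way coverage and elapsed-time agreement are multiplied together are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed record layout: event name -> timestamp (e.g. days since some epoch).
Record = Dict[str, float]


def to_edges(record: Record) -> List[Tuple[str, str, float]]:
    """Order events by time and link consecutive ones; weight = elapsed time."""
    ordered = sorted(record.items(), key=lambda kv: (kv[1], kv[0]))
    return [(a, b, tb - ta) for (a, ta), (b, tb) in zip(ordered, ordered[1:])]


def duplication_score(r1: Record, r2: Record, tol: float = 0.0) -> float:
    """Hypothetical score: node coverage times the share of matching edges."""
    shared = set(r1) & set(r2)
    union = set(r1) | set(r2)
    if not union:
        return 0.0
    # Fraction of events present in both records versus the union of events.
    coverage = len(shared) / len(union)
    # Compare event chains restricted to the shared events only.
    edges1 = to_edges({k: r1[k] for k in shared})
    weights2 = {(a, b): w for a, b, w in to_edges({k: r2[k] for k in shared})}
    if not edges1:
        return 0.0
    matches = sum(
        1
        for a, b, w in edges1
        if (a, b) in weights2 and abs(w - weights2[(a, b)]) <= tol
    )
    return coverage * matches / len(edges1)
```

Two records would then be flagged as duplicates when the score meets a chosen threshold, for example `duplication_score(r1, r2) >= 0.9`; the threshold value here is likewise only an example.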

2.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations ; : 35-42, 2023.
Article in English | Scopus | ID: covidwho-20234954

ABSTRACT

In recent years, COVID-19 has impacted all aspects of human life. As a result, numerous publications relating to this disease have been issued. Due to the massive volume of publications, some retrieval systems have been developed to provide researchers with useful information. In these systems, lexical searching methods are widely used, which raises many issues related to acronyms, synonyms, and rare keywords. In this paper, we present a hybrid relation retrieval system, CovRelex-SE, based on embeddings to provide high-quality search results. Our system can be accessed through the following URL: https://www.jaist.ac.jp/is/labs/nguyen-lab/systems/covrelex-se/. © 2023 Association for Computational Linguistics.

3.
Technology Application in Tourism in Asia: Innovations, Theories and Practices ; : 295-309, 2022.
Article in English | Scopus | ID: covidwho-2326083

ABSTRACT

Social media has been shown to affect tourist activity and spending. However, research related to travel intentions from a large-scale perspective has remained very limited in Indonesia. This research presents an empirical case study using the text mining process on Indonesian domestic tourists' travel intentions to fill this gap. Text classification was used to categorize whether a tweet includes travel intentions or not by concentrating on tourism-related tweet data from Twitter before and after the COVID-19 pandemic. The process of entity recognition was also used to classify the entities in the tweet. This study showed that the Indonesian intention to travel was 13.08 percent higher than before the COVID-19 pandemic. Moreover, it was also found that interest in adventure activities increased by 581.25 percent and honeymoon trips by 175 percent. Surprisingly, 92 percent of short-stay intentions concluded in this research. However, the number of Indonesian tourists who want to take a long tour rose by 215.18 percent. This study's findings also show Indonesian tourists' choice to fly to many destinations, such as Bali, the Riau Islands, and Bandung. A more successful Indonesian tourism promotion strategy is expected to develop as a result of this research. Referring to the study findings, it appears that the current model of promotion is relatively distinct from the existing one. Promotional activities that emphasize and focus on 1) sustainable growth, 2) improved productivity, 3) investment innovation and digital transformation, 4) morals, culture, and social responsibility, and 5) technological cooperation have become increasingly important to incorporate in various programs by The Ministry of Tourism of Indonesia. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022.

4.
International Journal of Semantic Computing ; 2023.
Article in English | Scopus | ID: covidwho-2318669

ABSTRACT

Deduplication is a key component of the data preparation process, a bottleneck in the machine learning (ML) and data mining pipeline that is very time-consuming and often relies on domain expertise and manual involvement. Further, temporal data is increasingly prevalent and is not well suited to traditional similarity and distance-based deduplication techniques. We establish a fully automated, domain-independent deduplication model for temporal data domains, known as TemporalDedup, that infers the key attribute(s), applies a base set of deduplication techniques focused on value matches for key, non-key, and elapsed time, and further detects duplicates through inference of temporal ordering requirements using Longest Common Subsequence (LCS) for records of a shared type. Using LCS, we split each record's temporal sequence into constrained and unconstrained sequences. We flag suspicious (errant) records that are non-adherent to the inferred constrained order and we flag a record as a duplicate if its unconstrained order, of sufficient length, matches that of another record. TemporalDedup was compared against a similarity-based Adaptive Sorted Neighborhood Method (ASNM) in evaluating duplicates for two disparate datasets: (1) 22,794 records from Sony's PlayStation Network (PSN) trophy data, where duplication may be indicative of cheating, and (2) emergency declarations and government responses related to COVID-19 for all U.S. states and territories. TemporalDedup (F1-scores of 0.971 and 0.954) exhibited combined sensitivities above 0.9 for all duplicate classes whereas ASNM (0.705 and 0.732) exhibited combined sensitivities below 0.2 for all time and order duplicate classes. © 2023 World Scientific Publishing Company.
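
The Longest Common Subsequence (LCS) step named in this abstract can be sketched with the standard dynamic-programming algorithm. This is a generic LCS, not the TemporalDedup code; the function name `lcs` and the event names in the usage note are hypothetical.

```python
from typing import List, Sequence


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two event-order sequences."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    out: List[str] = []
    i, j = m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]
```

For two records whose events occur in the orders `["declare", "close", "mask", "reopen"]` and `["declare", "mask", "close", "reopen"]`, the common subsequence has length 3: the events that keep the same relative order in both records. How such a result is turned into constrained and unconstrained sequences is specific to the paper and is not reproduced here.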

5.
Data Technologies and Applications ; 57(2):222-244, 2023.
Article in English | Web of Science | ID: covidwho-2309391

ABSTRACT

Purpose: The purpose of this study was to design a multitask learning model so that biomedical entities can be extracted from biomedical texts without ambiguity. Design/methodology/approach: In the proposed automated bio entity extraction (ABEE) model, a multitask learning model is built by combining single-task learning models. The model uses Bidirectional Encoder Representations from Transformers to train each single-task learning model, then combines the models' outputs to find the variety of entities in biomedical text. Findings: The proposed ABEE model targeted unique gene/protein, chemical and disease entities from the biomedical text. The finding is important for biomedical research such as drug finding and clinical trials. This research helps reduce not only the effort of the researcher but also the cost of new drug discoveries and new treatments. Research limitations/implications: As such, there are no limitations with the model, but the research team plans to test the model with gigabytes of data and establish a knowledge graph so that researchers can easily estimate the entities of similar groups. Practical implications: As far as the practical implications are concerned, the ABEE model will be helpful in various natural language processing tasks: in information extraction (IE), it plays an important role in biomedical named entity recognition and biomedical relation extraction, and it also helps in information retrieval tasks such as literature-based knowledge discovery. Social implications: During the COVID-19 pandemic, the demand for this type of work increased because of the increase in clinical trials at that time. If this type of research had been introduced earlier, it would have reduced the time and effort for new drug discoveries in this area.
Originality/value: In this work we propose a novel multitask learning model that is capable of extracting biomedical entities from biomedical text without ambiguity. The proposed model achieved state-of-the-art performance in terms of precision, recall and F1 score.

6.
7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 ; : 1-10, 2022.
Article in English | Scopus | ID: covidwho-2290872

ABSTRACT

Named Entity Recognition (NER) is a well-known problem for the natural language processing (NLP) community. It is a key component of different NLP applications, including information extraction, question answering, and information retrieval. In the literature, there are several Arabic NER datasets with different named entity tags; however, due to data and concept drift, we are always in need of new data for NER and other NLP applications. In this paper, first, we introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset extracted from the Arabic Newspaper COVID-19 Corpus (AraNPCC). CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published during 2019, 2020, and 2021. The dataset is labeled by five annotators with five named-entity tags, namely: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is available for download for free. We evaluate the corpus by fine-tuning four BERT-based Arabic language models on the CAraNER corpus. The best model was AraBERTv0.2-large with 0.86 for the F1 macro measure. © 2022 Association for Computational Linguistics.

7.
1st International Conference on Machine Learning, Computer Systems and Security, MLCSS 2022 ; : 301-306, 2022.
Article in English | Scopus | ID: covidwho-2294226

ABSTRACT

The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We therefore work on a COVID-19 dataset about the Omicron variant to recognise named entities in a given text. We collect COVID-related data from newspapers and tweets. This article covers named entities such as COVID variant name, organization name, location name, and vaccine name. The process includes tokenisation, POS tagging, chunking, levelling, editing, and running the program. It helps to recognise named entities in a huge dataset, such as where COVID spread most (location), which variant spread most (variant name), and which vaccine has been given (vaccine name). In this work, we have identified the names. If we treat unemployment, economic downfall, death, recovery, and depression as topics, we can also identify the topic names and the phase in which they occurred. © 2022 IEEE.

8.
Int J Mol Sci ; 23(23)2022 Nov 29.
Article in English | MEDLINE | ID: covidwho-2296973

ABSTRACT

The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.


Subject(s)
Artificial Intelligence , Data Mining , Data Mining/methods , PubMed , Databases, Factual , Proteins

9.
13th IEEE International Conference on Knowledge Graph, ICKG 2022 ; : 56-63, 2022.
Article in English | Scopus | ID: covidwho-2258490

ABSTRACT

While manual analysis of news coverage is difficult and time consuming, methods in natural language processing can be used to uncover otherwise hidden semantics. This work analyses more than 370,000 news articles to explore connections and trends in business decisions and their financial impact during the COVID-19 pandemic. Topic modelling, sentiment analysis and named entity recognition methods are used to identify connections between the articles and the financial performance of selected companies or industries. This report sets out the results of the individual natural language processing methods and the resulting analysis with financial data. Interesting contrasting topics in the media can be filtered out that are associated with the companies with the highest or lowest positive sentiment. This information could be useful to companies to gain an understanding of topics that are currently treated favourably or unfavourably by the media and hence assist with communication strategies and competitive intelligence. © 2022 IEEE.

10.
2022 IEEE International Conference on Big Data, Big Data 2022 ; : 5173-5181, 2022.
Article in English | Scopus | ID: covidwho-2248652

ABSTRACT

Clinical Cohort Studies (CCS), such as randomized clinical trials, are a great source of documented clinical research. Ideally, a clinical expert inspects these articles for exploratory analysis ranging from drug discovery for evaluating the efficacy of existing drugs in tackling emerging diseases to the first test of newly developed drugs. However, more than 100 articles are published daily on a single prevalent disease like COVID-19 in PubMed. As a result, it can take days for a physician to find articles and extract relevant information. Can we develop a system to sift through these articles faster and document the crucial takeaways from each of these articles? In this work, we propose CCS Explorer, an end-to-end system for relevance prediction of sentences, extractive summarization, and patient, outcome, and intervention entity detection from CCS. CCS Explorer is packaged in a web-based graphical user interface where the user can provide any disease name. CCS Explorer then extracts and aggregates all relevant information from articles on PubMed based on the results of an automatically generated query produced on the back-end. For each task, CCS Explorer fine-tunes pre-trained language representation models based on transformers with additional layers. The models are evaluated using two publicly available datasets. CCS Explorer obtains a recall of 80.2%, AUC-ROC of 0.843, and an accuracy of 88.3% on sentence relevance prediction using BioBERT and achieves an average Micro F1-Score of 77.8% on Patient, Intervention, Outcome detection (PIO) using PubMedBERT. Thus, CCS Explorer can reliably extract relevant information to summarize articles, saving time by ~660×. © 2022 IEEE.

11.
Expert Systems with Applications ; 223, 2023.
Article in English | Scopus | ID: covidwho-2263399

ABSTRACT

Because of the frequent occurrence of chronic diseases, the COVID-19 pandemic, etc., online health expert question-answering (HQA) services have been unable to cope with the rapidly increasing demand for online consultations. Building a virtual health assistant based on medical named entity recognition (NER) can effectively assist with the consultation process, but the unstandardized expressions within HQA text pose a serious challenge for medical NER tasks. The main goal of this study is to propose a novel deep medical NER approach based on a collaborative decision strategy (CDS), i.e., co_decision_NER (CDN), that can identify standard and nonstandard medical entities in the HQA context. We collected 10,000 question–answer pairs from HaoDF, extracted medical entities from 15 entity categories, and used a CDS to fuse the advantages of different NER models. Ultimately, CDN achieved a performance (precision = 84.50%, recall = 84.30%, F1 = 84.40%) that was significantly better than that of the state-of-the-art (SOTA) method. Our empirical analysis suggests that the entity types Disease (DIS), Sign (SIG), Test (TES), Drug (DRU), Surgery (SUR), Precaution (PRE), and Region (REG) can be most easily expressed arbitrarily in the doctor–patient interaction scenario of HQA services. In addition, CDN can identify not only standard but also nonstandard medical entities, effectively alleviating the severe out-of-vocabulary (OOV) problem faced by HQA services when performing medical NER tasks. The core contribution of this study is the development of a novel neural network model fusion algorithm that can improve the performance of entity recognition in medical domain-specific tasks. © 2023 Elsevier Ltd
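
The reported metrics are internally consistent, which can be checked with the standard F1 definition (the harmonic mean of precision and recall). The helper name `f1_score` below is only illustrative; the precision and recall values are the ones quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for CDN, in percent.
print(round(f1_score(84.50, 84.30), 2))  # prints 84.4
```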

12.
IET Cyber-Physical Systems: Theory and Applications ; 2023.
Article in English | Scopus | ID: covidwho-2244409

ABSTRACT

With the rapid development of biomedical research and information technology, the volume of clinical medical literature has increased exponentially. At present, COVID-19 clinical text research has some problems, such as lack of corpus and poor annotation quality. In clinical medical literature, there are many medical related semantic relationships between entities. After the task of entity recognition, how to further extract the relationships between entities efficiently and accurately becomes very critical. In this study, a COVID-19 clinical trial data relationship extraction model based on a deep learning method is proposed. The model adopts an integrated architecture of the MPNet model, a bidirectional-GRU (BiGRU) network, the MAtt mechanism and a Conditional Random Field inference layer, and uses the pre-trained language model to address the problem that static word vectors cannot represent ambiguity. The BiGRU network replaces the current bidirectional long short-term memory structure and simplifies the network structure of long short-term memory to improve the training efficiency of the model. Through comparative experiments, the proposed method performs well in the COVID-19 clinical text entity relation extraction task. © 2023 The Authors. IET Cyber-Physical Systems: Theory & Applications published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

13.
Studies in Computational Intelligence ; 1060:267-278, 2023.
Article in English | Scopus | ID: covidwho-2239163

ABSTRACT

From the outset of the COVID-19 pandemic, social media has provided a platform for sharing and discussing experiences in real time. This rich source of information may also prove useful to researchers for uncovering evolving insights into post-acute sequelae of SARS-CoV-2 (PACS), commonly referred to as Long COVID. In order to leverage social media data, we propose using entity-extraction methods for providing clinical insights prior to defining subsequent downstream tasks. In this work, we address the gap between state-of-the-art entity recognition models and the extraction of clinically relevant entities which may be useful to provide explanations for gaining relevant insights from Twitter data. We then propose an approach to bridge the gap by utilizing existing configurable tools, and datasets to enhance the capabilities of these models. Code for this work is available at: https://github.com/VectorInstitute/ProjectLongCovid-NER. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

14.
J Ambient Intell Humaniz Comput ; : 1-15, 2021 Jun 10.
Article in English | MEDLINE | ID: covidwho-2243986

ABSTRACT

Real-time data processing and distributed messaging are problems that have been worked on for a long time. As the amount of spatial data being produced has increased, coupled with increasingly complex software solutions being developed, there is a need for platforms that address these needs. In this paper, we present a distributed and light streaming system for combating pandemics and give a case study on spatial analysis of the COVID-19 geo-tagged Twitter dataset. In this system, three of the major components are the translation of tweets matching with user-defined bounding boxes, named entity recognition in tweets, and skyline queries. Apache Pulsar addresses all these components in this paper. With the proposed system, end-users have the capability of getting COVID-19 related information within foreign regions, filtering/searching location, organization, person, and miscellaneous based tweets, and performing skyline based queries. The evaluation of the proposed system is done based on certain characteristics and performance metrics. The study differs greatly from other studies in terms of using distributed computing and big data technologies on spatial data to combat COVID-19. It is concluded that Pulsar is designed to handle large amounts of long-term on-disk persistence.

15.
Data Technologies and Applications ; 2022.
Article in English | Scopus | ID: covidwho-2232143

ABSTRACT

Purpose: The purpose of this study was to design a multitask learning model so that biomedical entities can be extracted from biomedical texts without ambiguity. Design/methodology/approach: In the proposed automated bio entity extraction (ABEE) model, a multitask learning model is built by combining single-task learning models. The model uses Bidirectional Encoder Representations from Transformers to train each single-task learning model, then combines the models' outputs to find the variety of entities in biomedical text. Findings: The proposed ABEE model targeted unique gene/protein, chemical and disease entities from the biomedical text. The finding is important for biomedical research such as drug finding and clinical trials. This research helps reduce not only the effort of the researcher but also the cost of new drug discoveries and new treatments. Research limitations/implications: As such, there are no limitations with the model, but the research team plans to test the model with gigabytes of data and establish a knowledge graph so that researchers can easily estimate the entities of similar groups. Practical implications: As far as the practical implications are concerned, the ABEE model will be helpful in various natural language processing tasks: in information extraction (IE), it plays an important role in biomedical named entity recognition and biomedical relation extraction, and it also helps in information retrieval tasks such as literature-based knowledge discovery. Social implications: During the COVID-19 pandemic, the demand for this type of work increased because of the increase in clinical trials at that time. If this type of research had been introduced earlier, it would have reduced the time and effort for new drug discoveries in this area.
Originality/value: In this work we propose a novel multitask learning model that is capable of extracting biomedical entities from biomedical text without ambiguity. The proposed model achieved state-of-the-art performance in terms of precision, recall and F1 score. © 2022, Emerald Publishing Limited.

16.
2022 IEEE MIT Undergraduate Research Technology Conference, URTC 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2230986

ABSTRACT

This paper presents a named entity recognition system for the specific domain of Vietnamese COVID-19 news articles. By incorporating manually selected and domain-specific features into a simple deep learning architecture, the system can identify a wide range of custom named entities relevant in the context of COVID-19 and future epidemics. Using high-dimensional embedding vectors in combination with part-of-speech tags and additional features, the system achieves an F score of about 90.41%, surpassing or coming close to results by other models that are more complicated or pre-trained and fine-tuned. © 2022 IEEE.

17.
Appl Psychol Health Well Being ; 2022 Jun 10.
Article in English | MEDLINE | ID: covidwho-2231456

ABSTRACT

During the COVID-19 pandemic, quarantine has been implemented as a physical distancing measure to reduce the risk of transmission. However, no studies have examined the relationship between quarantine and daily affective experiences. Few studies have examined the individual-level factors that may alleviate or strengthen the negative impact of quarantine on daily affective experiences. To this end, we conducted a diary study by comparing the affective experiences of people in quarantine with those of people not subject to quarantine. There were 201 participants in the study. After the pretest collecting responses on demographic information and entity theory of emotion, the participants completed a daily questionnaire measuring their daily positive and negative affect for 14 consecutive days. The results of hierarchical linear modeling showed that the participants in the quarantine condition reported less daily positive affect than those in the social interaction condition. We found that when the participants under quarantine believed more strongly that their emotions could not be changed, they reported a higher level of daily negative affect. These findings demonstrate the role of entity theory of emotion in understanding daily negative affect during quarantine.

18.
2022 IEEE MIT Undergraduate Research Technology Conference, URTC 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2223158

ABSTRACT

This paper presents a named entity recognition system for the specific domain of Vietnamese COVID-19 news articles. By incorporating manually selected and domain-specific features into a simple deep learning architecture, the system can identify a wide range of custom named entities relevant in the context of COVID-19 and future epidemics. Using high-dimensional embedding vectors in combination with part-of-speech tags and additional features, the system achieves an F score of about 90.41%, surpassing or coming close to results by other models that are more complicated or pre-trained and fine-tuned. © 2022 IEEE.

19.
2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022 ; : 2266-2273, 2022.
Article in English | Scopus | ID: covidwho-2223088

ABSTRACT

We gain insight into the COVID-19 pandemic response by the various U.S. states through analysis of open source emergency declaration, mitigation, and response policy data. We propose ASNM + POD, a Partial Ordering Detection extension to the Adaptive Sorted Neighborhood Method to identify redundancies and implied temporal ordering requirements to understand how various U.S. states respond to COVID-19. We further strengthen the well-established ASNM entity matching method and address key limitations of its Longest Common Subsequence extension (ASNM + LCS) through detection of all temporal order requirements. Partial order requirements are determined probabilistically through empirical review of all records' time-ordered event sequences. We demonstrate effectiveness against a COVID-19 U.S. state policy dataset comprised of daily time-series data pulled from February and October 2022, where attributes are partially and variably populated. ASNM + POD yielded an F1 of 0.995 and an MCC of 0.985, significantly outperforming both ASNM and ASNM + LCS with F1/MCC improvements of 22%/50% and 15%/37%, respectively. Finally, we highlight the limited consensus on policies enacted, the variability in timelines of policy activations/deactivations, and activity at and after the two-year mark. © 2022 IEEE.

20.
2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022 ; : 2274-2280, 2022.
Article in English | Scopus | ID: covidwho-2223066

ABSTRACT

Toward efficient learning of massive publications during the COVID-19 pandemic, we propose a pipeline, Knowledge Extraction for COVID-19 Publications (KEP), that aims at automatic extraction and representation of key knowledge from user-interested publications. The first version, KEP-1.0, has been developed and published on the Python Package Index (PyPI) (URL: https://pypi.org/project/KEP/). In this first release, knowledge about key topics, disease discussions, and location mentions for each publication is provided. KEP-1.0 not only extracts relevant knowledge but, more importantly, emphasizes the top discussed entities and presents visualizable plots, including bar graphs and word clouds. This allows a rapid preliminary understanding of the main discussions in the publication from these three aspects. Moreover, an enhanced TF-IDF algorithm, the weighted TF-IDF, targeting the publication topic identification purpose, has been proposed and evaluated. The pipeline is fully open-sourced and customizable. KEP-1.0 is ready for use in its current form or to be embedded into existing literature platforms. This pipeline is designed for COVID-related publications, but it has the potential to benefit similar knowledge extraction tasks for other topics of interest with a rapidly increasing number of publications. © 2022 IEEE.
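
For orientation, the baseline that the weighted variant builds on can be sketched as plain TF-IDF over tokenized documents. The abstract does not describe the weighting scheme, so none is attempted here; this sketch does not use the KEP package API, and the function name `tf_idf` and the toy documents in the usage note are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Plain TF-IDF: relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores
```

With `docs = [["covid", "vaccine", "covid"], ["covid", "policy"]]`, the term "covid" appears in every document and scores zero, while "vaccine" and "policy" receive positive scores. Ranking terms by score within each document gives a simple topic list of the kind that bar graphs or word clouds can display.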
